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Abstract 

Simultaneous recordings from multiple neural units allow us to investigate the activ- 
ity of very large neural ensembles. To understand how large ensembles of neurons 
process sensory information, it is necessary to develop suitable statistical models 
to describe the response variability of the recorded spike trains. Using the infor- 
mation geometry framework, it is possible to estimate higher-order correlations 
by assigning one interaction parameter to each degree of correlation, leading to a 
(2 N — l)-dimensional model for a population with N neurons. However, this model 
suffers greatly from a combinatorial explosion, and the number of parameters to be 
estimated from the available sample size constitutes the main intractability reason 
of this approach. To quantify the extent of higher than pairwise spike correlations 
in pools of multiunit activity, we use an information-geometric approach within the 
framework of the extended central limit theorem considering all possible contribu- 
tions from high-order spike correlations. The identification of a deformation param- 
eter allows us to provide a statistical characterisation of the amount of high-order 
correlations in the case of a very large neural ensemble, significantly reducing the 
number of parameters, avoiding the sampling problem, and inferring the underlying 
dynamical properties of the network within pools of multiunit neural activity. 
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1 Introduction 



To understand how sensory information is processed in the brain, we need 
to investigate how information is represented collectively by the activity of a 
population of neurons. There is a large body of evidence suggesting that pair- 
wise correlations are important for information representation or processing 
in retina [Tp] . thalamus [3] and cerebral [lf5|6] and cerebellar cortices 08]. 
However, there is also evidence that in at least some circumstances, pairwise 
correlations do not by themselves account for multineuronal firing patterns 
[9lfl0] ; in such circumstances, triplet and higher-order interactions are impor- 
tant. The role of such higher-order interactions in information processing is 
still to be determined, although they are often interpreted as a signature of 
the formation of Hebbian cell assemblies (TTJ . 

Higher-order correlations may be important for neural coding even if they 
arise from random fluctuations. Amari and colleagues [12J have suggested 
that a widespread distribution of neuronal activity can generate high-order 
stochastic interactions. In this case, pairwise or even triplet-wise correlations 
do not uniquely determine synchronised spiking in a population of neurons, 
and high-order interactions across neurons cannot be disregarded. Thus, to 
gain a better understanding of how neural information is processed, we need 
to study, whether higher-order interactions arise from cell assemblies or from 
stochastic fluctuations. Information-geometric measures can be used to anal- 
yse neural firing patterns including correlations of all orders across the popu- 
lation of neurons |13fl4fl5fl6fl7fl8|9|T0] . 

A straightforward way to investigate the neural activity of a large population 
of neurons is to use binary maximum entropy models incorporating pairwise 
correlations on short time scales [JJ2]. To estimate this model, one has to 
consider a sufficient amount of data to measure the mean activity of individual 
neurons and correlations in pairs of neurons. This allows us to estimate the 
functional connectivity in a population of neurons at pairwise level. However, if 
higher-order correlations are present in the data, and as the number of possible 
binary patterns grows exponentially with the number of neurons, we would 
need to use an appropriate mathematical approach to go beyond a pairwise 
modelling. This is in general a difficult problem, as sampling even third-order 
interactions can be difficult in a real neurophysiological setting [9fT0] . 




However, under particular constraints, this sampling difficulty can be sub- 
stantially ameliorated. An example of such a constraint is pooling, in which 
the identity of which neuron fires a spike in each pool is disregarded. Such 
a pooling process reflects the behaviour of a simple integrate-and-fire neuron 
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model in reading out the activity of an ensemble of neurons. It also reflects 
a common measurement made in systems neuroscience: the recording of mul- 
tiunit neural activity, without spike sorting. Whether or not such constraints 
permit a complete description of neural information processing is still a mat- 
ter of debate [6lfT0] . but they may allow substantial insight to be gained into 
the mechanisms of information processing in neural circuits. A method for the 
statistical quantification of correlations higher than two, in the representation 
of information by a neuronal pool, would be extremely useful, as it would allow 
the degree of higher - order correlations to be estimated from recordings of 
multiunit activity (MUA) - which can be performed in a much broader range 
of circumstances than "spike-sortable" recordings can. 

In this paper, we use a pooling assumption to investigate the limit of a 
very large neural ensemble, within the framework of information geometry 
[T2"f 13fl4f 15fl6fl7f 18fl9f2U] . In particular, we take advantage of the recent 
mathematical developments in g-geometry (and g-information geometry) and 
the extended central limit theorem [21 22.23 24 25 26J to provide a statistical 
quantification of higher-order spike correlations. This approach allows us to 
identify a deformation parameter to characterise the extent of spike corre- 
lations higher than two in the limit of a very large neuronal ensemble. Our 
method accounts for the different regimes of firing within the probability dis- 
tribution and provides a phenomenological description of the data, inferring 
the underlying role of noise correlations within pools of multiunit neural ac- 
tivity. 



2 Methodology 

2. 1 Information geometry and the pooled model 

We represent neuronal firing in a population of size iV by a binary vector 
X = ( Xij . . . , Xff), where Xj = if neuron i is silent in some time window 
AT and Xi — 1 if it is firing (see Fig [Tla). Then, for a given time window, 
we consider the probability distribution of binary vectors, {P(X)}. Any such 
probability distribution {P(X)} is made up of 2 N probabilities 



P(X) = Prob{X 



i 




in} = Pi 




subject to the normalization 



E 



p. 



ll-.-lN 



1. 




)»JV=0,1 
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As proposed by Amari and co-workers (see, for instance, Refs [20lfT5] ). the 
set of all the probability distributions {P(X)} forms a (2 — l)-dimensional 
manifold S^. This approach uses the orthogonality of the natural and expec- 
tation parameters in the exponential family of distributions. It is also useful 
for analysing neural firing in a systematic manner based on information ge- 
ometry. Any such probability distribution can be unequivocally determined 
using a coordinate system. One possible coordinate system is given by the set 
f 2" — 1 marginal probability values: 



f} i = E[x i ] = P{x i = l},i = l,...,N (3) 

fjij = E[xiXj] = P{xi = xj = 1}, % < j (4) 

77123..JV = E[xt...x N ] = P{x x = x 2 = ... = x N = 1}. (5) 

These are called the 77-coordinates [20]. Moreover, provided P(X) 7^ 0, any 
such distribution can be expanded as in Ref. [T5j 



P(X) = exp < x i®i + XiXjOiji- (6) 
+ XiXjXkOijk + ... + x 1 ... x N 8i_ N - tjj > , 

i<j<k ) 

where there are in total 2^ — 1 different 9 correlation coefficients that can 
be used to determine univocally the probability distribution. The 9 forms a 
coordinate system, named 9— coordinates, which correspond to e-flat structure 
in the (2^ — l)-dimensional manifold Sjy- In Eq. ([6]), ip is a normalization term. 
The ^-coordinates and 9 coordinates are dually orthogonal coordinates. The 
properties of the dual orthogonal coordinates allow the formulation of the 
generalised Pythagoras theorem that gives a decomposition of the Kullback 
Leibler divergence to calculate contributions of different orders of interaction 
between two probability distributions. 

These notions are rigorously developed in Refs. [12lll4|15|16lll7fl8fl9f20j within 
the framework of information geometry. Assigning one "interaction parame- 
ter" 9 to each degree of correlation, we have a (2^ — l)-dimensional model 
for a population with iV neurons. The basis of this formalism is shown in 
FigO: once our coordinate 9 is fixed we have a subset E{9) for each possible 
value of 9 and an exponential family of distributions (with the same value 
of 9). Notice that when higher-order correlations are considered, we need to 
construct an orthogonal multidimensional coordinate space to build the prob- 
ability distribution, as is schematised in Fig[Ti>- Let us consider M(fji, ...,fjjf) 
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to be a submanifold of the marginals in Sn- Then, the tangential direction 
of M(fji, 77jv) represents the direction in which only the pure correlation 
changes, while the tangential directions of E(9) span the directions in which 
only fix, f^jv change but 9 is fixed. Directions of changes in the correlations 
and marginals are required to be mutually orthogonal. 

It is important to point out that the estimation of all the parameters as- 
sociated with higher-order correlations suffers greatly from a combinatorial 
explosion. Consider, for instance, a population of 50 neurons for which we 
would need therefore more than 10 15 parameters. Thus, the number of param- 
eters to be estimated from the available sample size constitutes the reason for 
the intractability of this approach. 

However, if we assume that a neuron cannot process spikes from different neu- 
rons separately, then the labels of the neurons that fired each of the spikes 
are lost. In this case, the target neuron is only aware of the number of syn- 
chronous firings among its inputs. Similarly, a neurophysiological recording 
technique that disregards the identity of the neuron that fires each spike (i.e. 
it measures MUA) can only count the number of cells in the vicinity firing in 
a time window, rather than provide information about their pattern. 

To investigate the effects of such processes, we consider a pooled model [321X2], 
where we assume (for mathematical convenience) a population of N identical 
neurons. Rather than the full distribution of X with probabilities P{X) consid- 
ered above, we now introduce a set of probabilities P(k) where k = 0, 1, . . . , N 
represents the number of synchronous spiking neurons in the population dur- 
ing a time interval AT. 

This assumption greatly simplifies the analysis based on the coordinate sys- 
tems described above. The probability distribution is now characterised by 
only N parameters. Following Ref. p], we refer to the probability values P(k) 
as the p-coordinates (the pooled model is defined by N parameters). Then, 
considering the number of neurons that are firing simultaneously (within AT) 
in the pooled model, instead of the probability distribution Eq. (jBJ), one has 

P(l) = iVexp(0i-V(0)), (7) 

and 



P(k)=^j expf^ + ^Q e i -i?{e) > j,k = 2 1 ...,N (8) 

while -P(O) = exp(— ip{9)) corresponds to a completely silent response and is 
written in such way so as to fulfil normalization of the probabilities {P(k)}. 
The marginals indicating the probability of any k neurons firing within 
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AT, are given by r, k = £ with fc = 0, 1, . . . ,N [13] . In Eqs. © 

and (jSJ), #i corresponds to the first order contribution to the log probabilities, 
while each 4 (i — 2, . . . , N) represents the effect on the log probabilities of 
interactions of order i in the neuronal pool. We will deal with the relative 
probability of simultaneous spiking. The ratio between P(k) and P(k — 1) is 
given by the following expressions: 



P(0) " 
P(2) N 



iVexp(0a) 
- 1 



PI 



exp(^ + 9 2 ), 



(9) 
(10) 



and if k > 2 



P{k) N-k + l / ^ {k-l)\ Q Q \ ni , 

exp [Ox + V - \.. n J ... Bi + e k ). (11) 



P(k-l) k \ -!)!(*-*)! 



This approach has been applied in a previous publication [9] within the maxi- 
mum entropy (ME) principle to evaluate whether high-order interactions play 
a role in shaping the dynamics of neural networks. By fixing the correlation 
coordinate 9k+\ = in Amari's formalism, the ME constraints are set up to 
order k. This is, the ME principle is archived by simple fixing {9i}i>k = and 
the distribution is determined by {8i}i<k- The constraints of the ME principle 
allow us to choose the most random distribution subject to these constraints 
[T5|9] . Thus correlation structures that do not belong to the constrained fea- 
tures are simply removed. Let us consider for instance ME constraints at pair- 
wise level that are set by fixing {6 l j} i>2 = 0, therefore all correlation structures 
of order higher than two are removed. 



In a recent paper, Yu et al [27] have shown that higher - order correlation struc- 
tures are quite crucial to get a better understanding of how information might 
be transmitted in the brain. They have reported the importance of higher 
- order interactions to characterise the cortical activity and the dynamics of 
neural avalanches, exhaustively analysing how the Ising like models 128 1112129] 
do not provide in general a fair description of the overall organization of neural 
interactions when considering large neuronal systems. 



2.2 A simple binomial approach 



In order to get a better understanding of how to take the limit of a very large 
number of neurons, we first discuss a quite naive approach using Bernoulli 
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trials and a binomial distribution. Despite its simplicity and limitations, it 
helps to provide some intuitive feedback. 

Let us consider the activity of a neuronal population of N neurons in a specific 
time window AT. It should be noted that P(k) denotes the probability of k 
cells firing simultaneously and N — k being silent, with < k < N. Let us 
denote by A the mean number of spiking cells, then A = Ep[k] = Y,k=o kP(k) = 
N p where we introduced p = -4. 

If we assume that members of a population of neurons spike independently and 
with a fixed, constant probability, then the spiking activity of the population 
can be modelled as Bernoulli trials. For a population containing N neurons, 
each with a firing probability p, the probability P(k) of having k neurons firing 
simultaneously is given by the binomial distribution: P(k) = b(k;N,p) = 
\k)P k ^ ~ P) N ~ k ■> with k = 0, 1, . . . , N. In the limit of large N and small p, 
such that A = Np is finite, one has the so-called Poisson regime [3U]- In this 
case, the following relations hold for k > 1: 



b(t,N,p) _A 



b{k-l;N,p) k y N 
in the particular case fc = lwe have 

rn^p) _A+0( iv ) ' (13) 

while for k = the distribution in the Poisson regime reads 

b(0;N,p)=exp(-\) + O(^). (14) 

We now introduce these ansatze in Eqs. fl9|)- f|TT|) . The ratio between the dis- 
tribution one would derive in the absence of knowledge of spike correlation, 
P(l), and the distribution one would obtain when no spike is being fired, -P(O), 
is given by = Ne dl . Let us consider the limit iV — > oo with A = N p finite 
(p is a small quantity). Then, comparing Eq. ffT3]) with Eq. (Q, we can write 
A = Ne dl +0(jj), which leads to 




(15) 



Notice that in this regime, using Eq. ( !T4|) . the probability distribution P(k) 
at k = behaves as P (0) = = e~ A + O(^), which gives ip approximately 
equal to A for the normalizing factor. 
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Introducing Eq. (fT5|) into the probability of k neurons firing, Eq.©, as 

y ( 

lim iV(Af " 1) ;f" (fc+1)) = 1, in the large N limit, we get P(k, A) « §e"^e - 2 
Summing up, if a neuronal population behaves in the large N limit such that 

V>(0) ~ A (16) 

and accomplishes Eq. (fl5l) then the probability of cells firing simultaneously 
can be expressed as a 5-deformed Poisson distribution such as: 



\k p -X J(k) 

iMM)« fc! e , (17) 



where 5(fc) = ^ * s a deformation term that summarises correlations of 
all orders. 



Eq. (TT7|) corresponds to a Poisson distribution modulated by a multiple- 
correlation : e s ( k \ Notice that in the analytical approach presented above 
we have taken the limit of iV — > oo, but if only correlations up to order two 
dominated the neuronal spiking, the departure from a Poissonian behaviour 
in the probability of k (> 2) firing neurons would be small. Indeed, applying 
Amari's prescription [12], we obtain 8(2) =02 = O(j^) and then e s ^ k > is almost 
unity. But our derivation comprises all orders through the factor e s<yk \ 

It is important to note that we consider the mean number of spiking cells to be 
equal to the normalization factor (see Eq. (TT6|) ). This assumption represents 
only a possible subset of the probability distributions {P(X)} presented in 
Eq. OH]). Thus, in the limit of a very large number of neurons, and if higher 
- orders are taken into account, we cannot provide an analytical estimate of 
the entire set of all possible widespread distributions using the approach of 
Eq. (02). 

More importantly, due to the high dimensionality involved in determining 
Eq. (EJ), or even in our approach of Eq. (Tl7|) . we have to take just a few 
correlation terms to avoid having an intractable number of parameters [T^]|9] . 



But the more basic limitation of the above approach is due to the fact that 
taking a limit of a very large number of neurons or "thermodynamic limit" 
essentially means to account for the central limit theorem of statistics (CLT). 
The CLT articulates conditions under which the mean of a sufficiently large 
number of independent random variables, each with finite mean and variance, 
can be considered as normally distributed [31] . In the next sections we discuss 
how to take the limit of a very large number of neurons accounting also for 
higher-order correlations in the probability distribution. Importantly, we con- 
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sider the case of any probability distribution of firing, and do not just simply 
model the neural activity through a binomial distribution (or Bernoulli trials). 
We make use of an extension of the central limit theorem that also accounts 
for the case of correlated random variables [24J. This is particularly important 
since it allows us to extend analytical models accounting for effect the of high 
- order correlations on the PDFs, making it possible to compute the emergent 
properties of the system in the "thermodynamic limit" . 



2.3 q- Information geometry 



Recent studies on information geometry and complex systems have found a 
big number of distributions that in the asymptotic limit obey the power law 
rather than the Gibbs distribution [2"0"|32|12||2"3 33j. These power law distribu- 
tions seem to rule the asymptotic regime, and the g-exponential distributions 
used in non-extensive statistical mechanics are very useful for capturing such 
phenomena [32, 24, 22, 23, 25, 26, 20, 21J. The geometrical structure of such prob- 
ability distributions is termed g-geometry, and its mathematical foundations 
have been developed by Tsallis, Gell-Mann, Amari, and collaborators. They 
have in particular proved that the g-geometrical structure is very important 
to investigate systems with weakly and strongly correlated random variables 
l32lF2^1f22lf23ll25lf26lf20lf2T1. 



In particular, recently Amari and Ohara [2T] proved that it is possible to 
generalise the g-structure to any family of probability of distributions, and 
that the family of g-structure is ubiquitous since the of family of all probability 
distributions can always be endowed with the structure of the g-exponential 
family for an arbitrary g. 



The g-exponential is defined as 
exp g (x) = 




q){x)]— if (1 + (1 -q)(x)) > 0; 
otherwise 



where the limiting case of q — > 1 reduces to exp 1 (x) = exp(x). The q- 
exponential family is defined by generalising Eq. (jHJ) (for a review see {21] ) 

as 

P,(x,6) = exp 9 {e-a;-^(e)}, (19) 
where in particular the g-Gaussian distribution is defined as p(x, fi, a) = 

exp (— ^(/-^ °")) ( see Appendix, Section B, for a connection with the 

correlation coordinates). 

In analogy with the exponential families the g-geometry has a dually flat 
geometrical structure (as its coordinates are orthogonal) and accomplishes 



9 



the g-Pythagorean theorem. The maximiser of the g-escort distribution is a 
Bayesian MAP (maximum a posteriori probability) estimator as proved in 
Ref. [21] • Moreover, it is possible to generalise the g-structure to any family of 
probability distributions, because any parameterised family of probability dis- 
tributions forms a submanifold embedded in the entire manifold. Altogether, 
we can introduce the g-geometrical structure to any arbitrary family of prob- 
ability distributions and guarantee that the family of all the probability dis- 
tributions belongs to the g-exponential family of distributions for any g (21]. 
We refer to Fig [2] for a schematic explanation of the g-exponential family. 

Next, we take advantage of recent mathematical progress on g-geometry (and 
g-information geometry) to investigate the effect of high - order correlations 
on the probability distribution in the asymptotic limit. 



3 Results 

3.0.1 Beyond pairwise correlations 

In statistical mechanics it is said that we reach the "thermodynamic limit" 
when the number of particles being considered reaches the limit N — > oo. The 
thermodynamic limit is asymptotically approximated in statistical mechanics 
using the so-called central limit theorem (CLT) [3T]. The CLT ensures that 
the probability distribution function of any measurable quantity is a normal 
Gaussian distribution, provided that a sufficiently large number of independent 
random variables with exactly the same mean and variance are being consid- 
ered (see pages 324-330 [31] )• Thus, the CLT does not hold if correlations 
between random variables cannot be neglected. 

Thus, the CLT articulates conditions under which a sufficiently large number 
of independent and identically distributed random variables, each with finite 
mean and variance, can be considered as normally distributed [31]. In particu- 
lar, the CLT has been used by Amari and colleagues [12] to estimate the joint 
probability distribution of firing in a neuronal pool considering the limit of a 
very large number of neurons. Thus, in their approach pairwise correlations 
within the joint distribution of firing are quantified through the covariance 
< UiUj > of the weighted sum of inputs £7, and Uj of two given pairs of neu- 
rons (i ^ j, i = 1...N and j = 1...N ) [12J. This is U, = ££=i w {j - H, where 
Wij is the connection weight from the j th input to the i th neuron (H = E[Ui\ 
denotes the mean). Importantly, these Ui are being considered Gaussian due 
to the CLT [T2], and thus the approach quantifies the amount of pairwise 
correlations through the covariance < UJJj >. 

In the presence of weak or strong correlations of any sort, the CLT has been 



10 



generalised in recent publications by M Gell-Mann, C Tsallis, S Umarov, C Vi- 
gnat, A Plastino (see:[22,23 ; 24 ; 25 ; 26j). They have proved that when a system 
with weakly or strongly correlated random variables is being considered, if we 
gather a sufficiently large number of such systems together, the probability 
distribution will converge to a g-Gaussian distribution [22|23|24|25f26j . This 
is in agreement with the theorems recently proved by Amari and Ohara [2T] . 
which permit the introduction of the g-geometrical structure to any arbitrary 
family of probability distributions, and guarantee that the family of all the 
probability distributions belongs to the g-exponential family of distributions. 

We will use the "natural extension" of the central limit theorem (ECLT) pro- 
posed in [24J, which accounts for cases in which correlations between random 
variables are non- negligible. This results in so-called g-Gaussians (instead of 
Gaussians) as the PDFs in the ECLT, as proved in Ref . [2~4] : 



where q is a (problem-dependent) positive real index. Notice that in the limit 
of q = 1 a normal Gaussian distribution is recovered as lim A r_ >00 (l + -i:) Ar = e, 



which can be rephrased as lim 9 _>i(l + (1 — g))^-?) = e. In other words, the 
CLT is being recovered as q ->■ 1 [22H23ll^1l25lf26] . 




Let us now consider the probability of exactly k = N ■ r (and thus r = 
jj) neurons firing within a given time window AT across a population of N 
neurons. In the framework of the pooled model we have: 
Pr[r = jq] = Pr{x\ = x 2 = ■ ■ ■ = x k = 1, Xfc+i = . . . = x N = 0}, where 
the neuron Xj is subject to a weighted sum of inputs Ui, thus Xi — 1 if and 
only if Ui > and x,i = if u^ < 0. Following [TjZ], the neuronal pool receives 
common inputs Si,s 2 , (as schematised in Fig [3]), and u^ is weighted 
by the common inputs U{ = Y^fLi w ij s j ~ h, where tOy are randomly assigned 
connections weights. These Ui are g-Gaussian due of the ECLT [22, 23,24,2 5126] . 
Considering that the Ui are subject to a g-Gaussian distribution N q (—h, 1), we 
define in analogy to [12] Ui = \/l — a Vi + \/ae — h, for i = 1, .., N. We take 
a = E q [uiUj] as a q- variance, h = E q [ui] as the g-mean, and two independent 
g-Gaussian random variables Vi and e subject to N q (0, 1) (see [M] for a detailed 
description of g-Gaussian random variables). 

In the following we will use what is commonly referred to factorization ap- 
proach in statistical mechanics [351(36] . which is applicable in this case as we 
are considering weak correlations among neurons and the population of neu- 
rons is homogenous. We name E £ as the expectation value taken with respect 
to a random variable e, and P r {u > 0\e} is the conditional probability for e. 
This allows us to calculate the probability of having r = 4 neurons firing, 
separating the contribution of neurons that are firing [P r {u > 0\z}] k from 




(20) 




11 



those that are silent [P r {u < 0|e}] ], as 

P r {x 1 = x 2 = ■ ■ ■ = x k = 1, x k+ i = . . . = x N = 0} = 
E £ [P r {ui, u 2 ; . . . ; u k > 0, u k+1 ; ...;%< 0|e}] = 

N l v i (21) 



In order to go beyond the pairwise estimation of [12J (performed within the 
CLT framework), we need to quantify the amount of correlations higher than 
two in the probability distributions. If we take the limit of iV — > oo, in the 
framework of the g-Gaussian ECLT [22, 23 24. 3322] > instead of the Gaussian 
CLT as considered in H2I, we can define: 



F g (e) = P r (u > 0\e) = P r ( Ul > h ( 22 ) 

V 1 — ol 



roc 



If we take q = 1 in the probability of having r = neurons firing in Eq. 
and the distributions within the integral of Eq. f[2"2"j) corresponds to normal 
Gaussian distributions, and we are in the "CLT framework". On the other 
hand, the "ECLT framework" corresponds to q > 1 and in this case the system 
has weakly or strongly correlated random variables. Thus the distributions 
within the integral of Eq. (j22j) are considered as q-Gaussian distributions, and 
correlations are quantified through q. 

Notice that if we consider the limit of the CLT framework (q = 1), the previous 
expression reduces to 



F,^) - > fc (-Li-§) (23) 

where Erfc(x) = ^= f£° exp(—t 2 )dt denotes the complementary error function. 
However, if the effect of correlations higher than two is not negligible then 
according to the ECLT: q > 1, thus 



exp,(-y) = f ^y J o dtt— - 1 exp (-t-t(g-l)_), (24) 
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which is known as Hilhorst transform |37J, an integral representation widely 
used in generalised statistical mechanics [38]. Thus F g (e) reads 



1 Z" 00 Z" 00 i tP 1 

^) = f7X]i J^ 1 dtdvt—' 1 exp(-t + t(q-l)-). (25) 



Using several non-trivial identities between Gauss hypergeometric functions 
and the incomplete beta function, we can exactly calculate the integrals ex- 
pressed above (see Appendix Section A, for a detailed description of the math). 
And F q (e) in terms of a beta incomplete function reads 

where 

_ (q-l)(h- y^e) 2 

" 2{T^a) • (27) 

Eq. ( [2^]) allows us to calculate the probability (Eq. |2"T|) of exactly fc neurons 
firing within a given time window AT across a population of N neurons. 
Notice that the amount of correlations higher than two is quantified through 
q > 1, as when q is constrained to 1 it leads to F q=1 (e), which is reduced to 
the estimation within the CLT (Eq. [231 as in [12]. 

The joint distribution of firing can therefore be estimated as 



Q q (r)^NP r [r = ^} (21 



NE e { 



k 



The expectation value E e can be estimated using the saddle point approxima- 
tion [551121 



Q q (r) 



\r(l-r)\z»(e )\ 



-Lexp[A%(e ) - ^~ 
' Ztx £ 



(29) 



(1 — r) log( 1 fj^ 9 ). Within the saddle point ap- 



where z q (e) = r\og(^ 3 &] 

proximation: e = arg max £el L „ q K ^ n ^ ...... 

which implies r = F q (e ), where r goes between [0, 1] and e is defined for all 
real numbers. £o( £ ) (Eq. [27|) can take different values with a and h. Addi- 
tionally, Eq = F' 1 ^) depends on the degree of correlation of the network 
architecture, which is quantified by q. Fig [3] shows the behaviour of Q q (r) for 
q = 1.3 in comparison to q — 1. Notice that a higher degree of correlation, 



[^(e)] and = 0. The solution is e = F 
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q > 1, corresponds to a more widespread distribution. This is in agreement 
with the idea of Amari and Nakahara [12] that when taking a very large num- 
ber of neurons higher-order correlations are needed to reproduce the behaviour 
of a widespread distribution. 

Estimating Eq. (129]) for q > 1 requires a non trivial approach: finding e — 
F q 1 (r), and from Eq. (|26|) this means to estimate the inverse incomplete beta 
function of B i (-^-tA) (multiplied by the factor — ■ 1 , — r ). The inverse 

beta functions are tabulated in exhaustive detail, thus we can perform the nu- 
merical estimation of Eq. (129]) . It is important to note that the incomplete beta 

f w t s - 1 (l-i) s_1 dt 

function subroutines in matlab are normalised as I w (p,s) = — — b^Ts) > 

in which B\{p, s) = r^ff and therefore we should also multiply by Bi(p, s) 
to estimate Eq. (|26|) numerically. Notice from Eq. (|23|) that if we take q = 1, 
then F~\(r) is reduced to the estimation of an inverse complementary error 
function that is also a non-trivial mathematical operation. When q = 1 we are 
in the CLT framework and just estimating pairwise correlations, and we are 
potentially missing higher-order correlations. In contrast, F~ (r) allows us to 
quantify the amount of higher-order correlations as we are within the ECLT 
framework. It is important to point out that when q — 1, our previous findings 
reduce to the CLT limit estimations of Amari [12]. Thus, for real data, one 
can test for the presence of higher-order correlations by measuring the distri- 
bution of activity in multiunit recordings, and fitting q, which represents the 
amount of higher-order correlations present in the distribution of firing. One 
can show by simple comparison how statistically different from the q = 1 case 
the measured distribution is. 



In the next section (Experimental Results) we test the applicability of Eq. (I2"9l 
by measuring the spiking rate of multiunit activity in all non-overlapping win- 
dows of length AT. We fit the parameter q to find the best-fitting function 
Q q {r) in Eq. (|29l) for the experimental distribution. We then test the hypoth- 
esis of absence of higher-order correlation by comparison with a fit with q 
constrained to equal 1. 

In Section 13.21 we consider a network simulation model in which the number 
of interconnected neurons is a parameter under control. We then evaluate the 
hypothesis that Eq. (|29|) permits characterising the internal dynamics of the 
network for a spatio-temporal neuronal data set and quantifying the degree of 
higher-order correlations through q. 

A measure that is also particularly interesting in this context, since it was 
used in Ref. [3D] to give an information theoretic proof of the CLT, is Fisher 
information 
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im = L dT QM- (30) 

This measure is useful for detecting dynamical changes in the PDFs (i.e. a 
sharper probability distribution function would tend to have higher Fisher 
information than a more widespread PDF). Discretising Eq. ( 130]) as in Ref. 
[4Tj . using a grid of size M to calculate the distribution, one obtains 



1 M-l (Q _ Q )2 

HQ,) = 7 E ^ 7' • (») 



4 S (<?«+! + Q 



<l, 



This provides us with an information theoretic measure to quantify the dy- 
namical changes of the distribution. It is usually accepted when considering 
Fisher information and population codes of independent neurons that they 
encode information through bell-shaped tuning curves, and that the mean fir- 
ing rate of a neuron is a Gaussian function of some variable (i.e the external 
stimuli). In this case the slope of the distribution increases as the width of 
the tuning curve becomes smaller, and then a shaper distribution would have 
a higher slope and thus higher Fisher information. However, this would the 
case if noise correlations are independent across neurons and independent of 
the tuning width, as sharpening in a realistic network does not guarantee a 
higher amount of Fisher information [42J . 



3.1 Experimental results 



To provide an initial test of the applicability of this approach to the charac- 
terisation of higher-order correlations in pools of activity obtained from real 
neurophysiological data, we used a silicon microfabricated linear electrode ar- 
ray (NeuroNexus Inc., Michigan, USA) to record multiunit activity from the 
mouse barrel cortex (see Materials and methods). We used 5 minutes of spon- 
taneous spiking cortical activity of three adult mice, in which 16 electrodes 
were placed to record the multiunit neural activity (three data sets with 16 
different channels each). In the following, we apply our formalism to provide 
a statistical characterisation of higher-order correlations in pools of multiu- 
nit neural activity for each of the 16 different channels of the three data sets 
(thus in total 48 different recordings of 5 minutes in length), where we con- 
sidered four non-overlapping window lengths of AT = 25 ms, AT = 50 ms, 
AT = 100 ms, and AT = 200 ms. Our selection of these non-overlapping 
windows is related to the time windows typically used to investigate spike cor- 
relations and the firing rate distributions in Ref. [43|I44|I45] . The time windows 
of AT = 25 ms and AT = 50 ms are similar values to those used to investigate 
spike correlations in Ref. [HUB], and broader time windows of AT = 100 ms 



15 



and AT = 200 ms have also been used to investigate the firing rate distribu- 
tions jl5], and functional role of spike correlations [H]. The window length 
of AT = 25 ms is close to the values used in [2|1 to investigate the effect of 
spike correlations. 

We estimated the normalised firing distribution Q q (r), for experimental data, 
by measuring the spiking rate in all non-overlapping windows of length AT 
(here 25, 50, 100, and 200 ms). We then fitted the parameter q, representing 
the extent of higher-order correlations present in the neuronal pool, to find 
the best-fitting function Q q (r) (as in Eq. f )2T?]) ). Values of q equal to unity 
imply the absence of higher-order correlations in the system. Fig [5] shows the 
fitted q extracted for 3 different sets of 16 simultaneously recorded channels 
of multiunit activity, of increasing depth in the cortex (in 50 fim steps). For 
each set, the recordings were taken using different time windows AT = 25 
ms (panels a-c), AT = 50 ms (panels d-f), AT = 100 ms (panels g-i) and 
AT = 200 ms {j-l)- Notice that the maximum and minimum values of q 
corresponds to channel 15 in panel i and channel 12 in panel respectively, 
for time windows of AT = 100 ms and AT = 200 ms. 

In order to understand how well the model proposed in Eq. (12"§1) fits the 
experimental data, we compared it with a fit with q constrained to equal 
1. We define the normalised firing rate as r = J fc ^ , where (k) is the mean 
firing rate and N max is the maximum number of spikes. Figs [6] and [7] show 
the experimental spontaneous distribution of firing Q q {r) as a function of r 
(on logarithmic scale) for those channels with the maximum and the minimum 
values of q, respectively. That is, we use the estimated q and the parameters 
a and h that give the best fit for the distribution. The optimisation fitting 
criterion is the normalised mean squared error (NMSE), and the default error 
value is lower than 0.05 (p — value < 0.05). It is apparent that for all 25, 
50, 100 and 200 ms time windows (Figs [61T71 a.b.c and d, respectively), the 
theoretical curve is a remarkably good fit to the experimental distribution 
(p — value < 0.05 ). In contrast, the q — 1 curve does not come close to 
modelling the data satisfactorily over a wide range of firing rates. 

To detect the dynamical changes in the probability distributions, we estimate 
Fisher information as in Eq. (131 j) . On the one hand Fig [H] shows Fisher In- 
formation versus q for the entire data set. Fig [9]a shows Fisher information 
averaged over the 48 different channels, which becomes higher for smaller time 
windows. Notice from Fig O that when considering the case q — 1, Fisher 
information takes higher values for 50, 100 and 200 ms time windows than for 
those in which q > 1 is considered (Fig H?b). This is not the case for 25 ms 
time windows in which Fisher information takes a much higher value for q > 1 
than for those with q constrained to be equal to 1. As expected, Fisher infor- 
mation is more substantial at shorter time windows, where the fine temporal 
precision at which the spikes may synchronise has a significant effect. 



16 



Overall our findings show that spontaneous distributions of firing with q > 1 
should be considered to accurately reproduce the experimental data set. The 
main advantage of this method is that through the inclusion of a deformation 
parameter q, which accounts for correlations higher than two within the prob- 
ability distributions, we can quantify the degree of correlation and reproduce 
the experimental behaviour of the distribution of firing. 

If we considered single neuron trial to trial fluctuations for a fixed stimulus 
in spike count fluctuations, the extent of noise correlations would depend on 
the width of the tuning curves as correlations come up from common inputs 
|42j . Thus, in this case Fisher information would depend on the sharpening 
of the distribution and also on the noise correlation, and therefore a sharper 
distribution would not necessarily imply a higher amount of Fisher informa- 
tion. This effect is shown, for instance, in Fig[7Jd (AT = 200 ms) and Fig[Tlfr 
(AT = 50 ms); notice that d corresponds to a value of q smaller than b. The 
firing distribution of Fig U7H has a higher slope than the one presented in 
Fig O, but this higher slope does not have a correspondence with a higher 
value of Fisher information (Fisher info equal to 0.0150 bits in Fig [TR and 
Fisher Info equal to 0.0395 bits in FigUFb). 

Summing up, we presented a formalism that provides us with an estimate of 
the degree of correlation for the distribution of firing within pools of multiu- 
nit neural activity. This method allows us to naturally distinguish how far 
this distribution of firing is from the one we would obtain if each neuron con- 
tributes within CLT (q = 1). It permits us, therefore, to quantify the amount 
of correlation significantly reducing the number of parameters associated with 
the correlation coordinates. 



3.2 A Network simulation model 

To test further our theoretical approach, we apply our formalism to a network 
simulation model in which the number for interconnected neurons is a param- 
eter under control. We then analyse a simple network model in which neurons 
receive common overlapping inputs as in Fig [31 considering that each neuron 
can be interconnected randomly with more than two neurons. This will allow 
us to also test the hypothesis of Amari and collaborators that weak higher- 
order interactions of almost all orders are required to realise a widespread 
activity distribution of a large population of neurons [T2] . 

The network simulation we use is the one developed in |46j, which consists 
of cortical spiking neurons with axonal conduction delays and spike timing- 
dependent plasticity (STDP). Each neuron in the network is described by the 
simple spiking model [UJ 
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v ' = OMv 2 + 5v + 140.U + I 
u' = a(bv — u) 



(32) 
(33) 



with the auxiliary after-spike resetting 



ifv > +30mV, then 



v <— c 



(34) 



u <— u + d 



Where v is the membrane potential of the neuron, and u is a membrane 
recovery variable, which accounts for the activation of K+ ionic currents and 
inactivation of Na+ ionic currents, and gives negative feedback to v. After 
the spike reaches its apex at +30 mV (not to be confused with the firing 
threshold) the membrane voltage and the recovery variable are reset according 
to Eq. (1341) . The variable / accounts for the inputs to the neurons [4"Tl4"6] . 

Since we cannot simulate an infinite-dimensional system on a finite-dimensional 
lattice, we choose a network with a finite-dimensional approximation taking a 
time resolution of 1 ms. The network consists of iV = 1000 neurons with 
the first Ne = 650 of excitatory RS type, and the remaining Ni = 350 
of inhibitory FS type |47J. Each excitatory neuron is connected to M — 
2,3,20,40,60,80,100,120,140 and 160 random neurons, so that the probabil- 
ity of connection is M/N = 0.002, 0.003,0.02,0.04,0.06,0.08,0.1,0.12,0.14 and 
0.16. Each inhibitory neuron is connected to M = 2,3,20,40,60,80,100,120,140 
and 160 excitatory neurons only. Synaptic connections among neurons have 
fixed conduction delays, which are random integers between 1 ms and 20 ms. 

The main idea of this simulation is to investigate Amari's hypothesis that 
correlations of almost all orders are needed to realise the widespread distribu- 
tion of firing when a large number of neurons is considered and to show the 
behaviour of the parameter q when the number of interconnected neurons is 
changed. 

In order to show that the probability distribution in the thermodynamic limit 
is not realised even when pairwise interactions, or third-order interactions, 
exist, we estimate the joint probability distribution of firing when each exci- 
tatory/inhibitory neuron is interconnected to M=2 and 3 neurons. We run 10 
minutes of simulated spiking activity, considering a window length of AT = 25 
ms, which is a time window close to the one used by [43|4^f2lfT] to investigate 
the effect of correlations. As shown in Fig 1. 10b q = 1 is a remarkably good 
fit to the simulated distribution of firing (p — value < 0.05 ) when each ex- 
citatory/inhibitory neuron is connected to M = 2, 3 random neurons. Notice 
that we find no difference in the probability distribution of firing when M=2 
and M=3 are considered. However, when the number of interconnected neu- 
rons becomes higher, the distribution of firing becomes more widespread (see 
Fig l.lOfc ) and q increases as the number of interconnected neurons increases 
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(Fig l.lip . In the current simulation we took 1000 neurons, however, when we 
chose 100-200 neurons, we were also able to capture the effect of the higher- 
order correlation if the parameter M was large enough to produce a widespread 
distribution (see Appendix, Section C). 



4 Discussion and conclusions 

Approaches using binary maximum entropy models at a pairwise level have 
been developed considering a very large number of neurons on short time 
scales 1 1 .2.29 . These models can capture essential structures of the neural 
population activity, however, due to their pairwise nature their generality has 
been subject to debate [48 11491)10] . In particular, using an information geomet- 
rical approach, E. Ohiorhenuan and J. D. Victor have shown the importance 
of the triplets to characterise scale dependence in cortical networks. They in- 
troduced a measure called "strain" that quantifies how a pairwise-only model 
must be "forced" to accommodate the observed triplet firing patterns |10j . 
Thus, although models accounting for pairwise interactions have proved able 
to capture some of the most important features of population activity at the 
level of the retina [Tf2] . pairwise models are not enough to provide reliable 
descriptions of neural systems in general |4~8f49|ll0f50f51] . 




Very little is known about how the information saturates as the number of 
neurons increases. It has been pointed out by Amari and colleagues [12] that 
as the number of neurons increases, pairwise or triplet-wise correlations are 
not enough to realise a widespread distribution of firing. Understanding how 
neural information saturates as the number of neurons increases would require 
the development of an appropriate mathematical framework to account for 
correlations higher than two in the thermodynamic limit. 

It is important, therefore, to develop an appropriate mathematical approach 
to investigate systems with a large number of neurons, which could account 
for correlations of almost all orders within the distribution of firing. 

In this paper we present a theoretical approach to quantify the extent of 
higher than pairwise spike correlation in pools of multiunit activity when 
taking the limit of a very large number of neurons. In order to do this, we take 
advantage of recent mathematical progress on g-geometry to investigate, in 
the asymptotic limit, the effect of higher-order correlations on the probability 
distributions within the ECLT framework |21|22|23|l24f25f26] . The main basis 
of our formalism is that when taking the limit of a very large number of 
neurons within the framework of the CLT as in [12], we are losing information 
about higher-order correlations. Thus, in the new theoretical approach we 
take the limit of a very large number of neurons within the framework of the 
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ECLT, instead of the CLT. The inclusion of a deformation parameter q in the 
ECLT framework allows us to reproduce remarkably well the experimental 
distribution of firing and to avoid the sampling size problem of Eq. due to 
the exponentially increasing number of parameters. 

We estimated the normalised firing distribution Q q (r), from multiunit record- 
ings, and fitted the parameter q to find the best-fitting function Q q {r) (as in 
Eq. fl29|) ). We showed by simple comparison how statistically different from the 
q — 1 case the measured distribution is (as it is reduced to the CLT pairwise 
estimations of [12] )• Our theoretical predictions provided a remarkably good 
fit for the experimental distribution. We showed that the parameter q > 1 
can capture higher-order correlations, which are salient features of the distri- 
bution of firing, when applied to our multiunit recording data obtained from 
mouse barrel cortex. As higher-order correlations were present in the data, the 
distributions in the CLT framework do not fit the experimental data well. 

Staude and collaborators [52|53|I54||55] have introduced a quite powerful ap- 
proach based on continuous-time point process to investigate higher-order cor- 
relations in non-Poissonian spike trains. Higher-order interactions are very im- 
portant to investigate the neuronal interdependence in the cortex and at pop- 
ulation level of the neuronal avalanches [27]. More specifically, Plenz and col- 
laborators [27] have developed a very powerful theoretical framework based on 
a Gaussian interaction model that takes into account the pairwise correlations 
and event rates and by applying a intrinsic thresholding permit distinguish- 
ing higher order interactions. In this approach the pattern probabilities for 
the so-called "DG model" were estimated using the cumulative distribution of 
multivariate Gaussians and showed a high fitting precision of the experimental 
data. Our current theoretical formalism relies on a different basis: the recent 
progress made on the ECTL, and the main goal of our approach, is to provide a 
quantification of the amount of correlations higher than two when considering 
a large population of neurons. Thus using mathematical tools of non extensive 
statical mechanics [38,24,25,26,20,32,12,33,22,23], we introduced an approach 
that provides a quantification of the degree of higher order correlation for a 
very large number of neurons. 

It would be interesting to develop in the close future a paper showing 
ful comparison of our current method with the one developed by Plenz and 
collaborators. This would help to investigate the possible link of the quantifi- 
cation through the q parameter with the "DG model" [27] and to gain more 
insights for future analysis. Evaluating the degree of q can help to understand 
further the processing of information in the cortical network and to get more 
understanding of the non-linearities of information transmission within a neu- 
ronal ensemble. Although formally speaking the multiunit recordings are not 
in the "thermodynamic limit", the current methodology presented in this pa- 
per is a completely new theoretical approach for theoretical neuroscience. And 
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as more data become available for a very large number of neurons, i.e. includ- 
ing evoked activity to sensory stimuli, our theoretical approach could provide 
an important mathematical tool to evaluate important questions such as how 
quickly neural information saturates as the number of neurons increases, and 
whether it saturates at a level much lower than the amount of information 
available in the input. 

The pooling process can be taken to reflect the behaviour of a simple integrate- 
and-fire neuron model in reading out the activity of an ensemble of neurons, or 
alternately the recording of multiunit neural activity, without spike sorting. As 
we cannot know what the exact number of neurons in our multiunit recordings 
is, we approximated this number by the maximum number of spikes. In the 
pooling approach we developed within the ECLT framework we assumed the 
asymptotic limit of a very large number of neurons. Our method was devel- 
oped within the information geometry framework, similar to that described in 
|10j . However, it is important to remark three important differences between 
the method developed in [10] and our current theoretical approach: first, we 
assumed homogeneity across neurons; second, our approach was developed 
assuming a very large number of neurons within the ECLT framework, and 
third, our approach accounted for cases in which a widespread probability 
distribution is not realised even when pairwise interactions, or third-order 
interactions, exist. The significance of our approach is that it allows us to ex- 
tend analytically solvable models of the effect of correlations higher than two 
and to compute their scaling properties in the case of a very large number 
of neurons. Our treatment provides a quantification of the degree of corre- 
lation within the probability distribution, which is summarised in a single q 
parameter, thus avoiding the sample size problem that constitutes the main 
intractability reason of the approaches presented in Eq. ([6]). 

To contrast our current data analysis with a case in which the network struc- 
ture is known a priori, we tested our theoretical approach using a network 
simulation model in which the number for interconnected neurons is a param- 
eter under control. We then analysed a simple network model in which neurons 
receive common overlapping inputs and considering that each neuron can be 
interconnected randomly with more than two neurons. We showed that in 
the specific network model, high connectivity is required to get a widespread 
distribution, which is in agreement with the hypothesis of Amari [12j that 
weak higher-order interactions of almost all orders are required for realising a 
widespread activity distribution in the "thermodynamic limit" [12] . Moreover, 
our results are in agreement with the hypothesis of Amari and colleagues [T2] 
that the widespread probability distribution in the thermodynamic limit is 
not realised even when pairwise interactions, or third-order interactions, ex- 
ist. Correlations of almost all orders are then needed to realise the widespread 
activity distribution of a very large population of neurons. In our current sim- 
ulation we took 1000 neurons, which may be considered quite a large number 
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in comparison to the number of neurons that a multiunit recording might cap- 
ture. However, when we chose 100-200 neurons in our simulation, we were also 
able to capture the effect of the higher-order correlation if the number of inter- 
connected neurons was large enough to produce a widespread distribution (see 
Appendix, section C). Thus, as higher-order correlations were present in the 
simulated data set, a very weird network architecture will be required to force 
q — 1, and thus in this case the distribution of firing in the CTL framework 
does not come even close to fitting the simulated data. 

The model we developed using an information - geometric approach within 
the ECLT framework, and together with Fisher information estimations, in 
principle could allow population codes involving higher-order correlations to 
be studied in the thermodynamic limit, in the same way as other authors 
have done for the second order case [56|42|57j . This is particularly interesting 
as in most cases pairwise models do not provide reliable descriptions of true 
biological systems [^5] . Thus our approach could be of help to gain further 
insights into the role of high-order correlations in information transmission 
for very large systems, and could also be an important mathematical tool 
to evaluate whether the evoked activity may induce plasticity effects on the 
network when compared to the spontaneous signal. Applying our formalism to 
a data set obtained from mouse barrel cortex using multiunit recordings, we 
showed that a simple estimation of the deformation parameter q attached to 
the probability distribution of firing can answer us how significant the degree 
of higher-order spike correlations is within pools of neural activity. In our 
current analysis we show that higher - order correlations are prevalent but 
they do not, in general, improve the accuracy of the population code when 
considering a large number of neurons. Overall our findings show that Fisher 
information increases as the time window decreases, which would involve an 
easier discrimination task when the time windows become shorter. 

Summarising, we presented an information-geometric approach to quantify 
the degree of spike correlations higher than two in pools of multiunit neu- 
ral activity. Our proposed formalism provides a statistical characterisation of 
the amount of high-order correlations through the q parameter, avoiding the 
sampling problem of the pooling approach when high-order correlations in the 
thermodynamic limit are considered. 



5 Materials and methods 

Recordings were made from adult female C56BL/6 mice, of 2-3 months of 
age. The animals were maintained in the Imperial College animal facility 
and used in accordance with UK Home Office guidelines. The experiments 
were approved by the UK Home Office, under Project License 70/6516 to 
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S. R. Schultz. Mice were sedated by an initial intraperitoneal injection of 
urethane (1.1 g/kg, 10 % w/v in saline) followed by an intraperitoneal in- 
jection of 1.5 ml/kg of Hypnorm/Hypnovel (a mix of Hypnorm in distilled 
water 1:1 and Hypnovel in distilled water 1:1; the resulting concentration be- 
ing Hypnorm:Hypnovel:distilled water at 1:1:2 by volume), 20 minutes later. 
Atropine (1 ml/kg, 10 % in distilled water) was injected subcutaneously. Fur- 
ther supplements of Hypnorm (1 ml/kg, 10 % in distilled water) were admin- 
istered intraperitoneally if required. Their body temperature was maintained 
at 37 ± 0.5°C with a heating pad. A tracheotomy was performed and an en- 
dotracheal tube (Hallowell EMC) was inserted to maintain a clear airway as 
previously described [58]. After the animal was fixed on stereotaxic frame, a 
craniotomy was performed, aimed at above barrel C2. A small window on the 
dura was exposed to allow insertion of the multi-electrode array. The exposed 
cortical surface was covered with artificial cerebrospinal fluid (in mM: 150 
NaCl, 2.5 KC1, 10 HEPES, 2 CaC12, 1 MgC12; pH 7.3 adjusted with NaOH) 
to prevent drying. The linear probe (model: Alxl6-3mm-50-413, NeuroNexus 
Technologies) was lowered into the brain perpendicularly to the cortical sur- 
face, until all 16 electrode sites were indicating multiunit neural activity, and 
allowed to settle for 30 minutes before recording began. 



Acknowledgments 

Research supported by PIP 0255/11 (CONICET), PIP 1177/09 (CONICET) 
and Pict 2007-806 (ANPCyT), Argentina (FM-MP), BBSRC DTA studentship 
(EP) and EPSRC EP/E002331/1, UK (SRS). 



Appendix 

A. Estimation of F q (e) and Q q (r) 

We take the limit of N ->• oo, in the "ECLT framework" [22l23T2^1l25ll26l 
instead of the Gaussian CLT as considered in [12J, we can define: 



F q (e) = P r {u > 0|e) = P r ( Ui > 



h — yfae 



(A-l) 



\/l - a 
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Notice that if we consider the limit of the CLT (q = 1), the previous expression 
reduces to 



^ / \ 1 ^ „ / 1 h — \fae . . . . 

F,,M = -Erfc^^). (A-2) 

where Erfc(x) = ^= f£° exp(—t 2 )dt denotes the complementary error function. 
However, if the effect of higher order correlations than two are not negligible 
then according to the ECTL q > 1, thus 

ex Pg (-y) = f ^y J o dtt— - 1 exp (-t-t(g-l)_) (A-3) 

which is known as Hilhorst transform [37J, an integral representation widely 
used in generalized statistical mechanics [38] ■ Thus F q (e) reads 

F fl ( e ) = i _y J^dtdvt-^eM-t + tiq-l)-) (A-4) 

V Q — 1 / — a 



and substituting w = \J 2 v i we can rewrite 



J h—\/a£ 



dv exp ( — t — t(q — 1) — 



exp (-£) ^ 



2 



Erf 



lt(q — 1) (h — ^/ae^ 
(A-5) 



where Erf(x) denotes the error function, 



Erf(x) 



7T 



I PX 

= / exp(-t 2 )dt 

7T JO 



(A-6) 



and 



1 f°° 

T(-,x 2 ) = 2 exp(-t 2 )dt. (A-7) 

2 ii 

Thus, using 



f°° a 1 T(/i + v) (3 
exp(— T(z/, ax)dx = —. . 2-P\(l, A* + ^ A* + 1; 
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in Eq. (lA-lj) where Re(a + 0) > 0,Re(/j,) > 0, Re{p, + v) > and 2 F\ denotes 
the Gauss hypergeometric function, we can derive a compact expression for 



F q as 



W = e(e) 2 F 1 (l,-l T ;-l T + i; rT i- M ) (A . 8) 
where we named 



C(e) = ^ - (A-9) 

and 

Ue) • (A-10) 

Making use of the fact that the hypergeometric functions accomplish the fol- 
lowing identity 

2 F 1 (a, b; c; z) = 2 F 1 (b, a; c; z) = (1 - zf" x - h 2 F 1 {c -a,c-b;c; z) (A-ll) 
(see |59|), we can name: a = 1; 6 = — = — ^ + ~\z = rrm and therefore 

v f ' q— 1' q-l 2' l+^o (e) 



g-l'g-l 2'l + £ (e) y V l + eo(^) y V ?-l 2'2'g-l 2' 1 + CoC^) 

(A-12) 

Then Eq. ( 1A-8j) reads as, 



/ x 1 „ , 1 1 1 1 1 1 

F q {e) = — ; 7—7=^ — — ~ i 2Ft{— — - -, -; — — + 



2^(^-1)^(1 + ^))^ V ?-! 22 V1 2'l + e (e) 

(A-13) 

Notice that the incomplete beta function [59] is defined as 

B w (p,q)= / t p - 1 (l-tr 1 ^ = -^ 2 F 1 (p,l-g;p + l, U ;) (A-14) 
Jo p 

and therefore naming p = — \,q= \ and w = 1+ ^ ( £ ) we can rewrite 

11 1 111111 
B_ i ( r, -) = — — - i — r 2-ri( r - -, -; r + -; 



i+^g-1'2 7 i)( 1 + 6(e ))^T-^ lv g-l 2'2'g-l 2' 1 + £„(£)'" 

(A-15) 
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This allows us to rewrite F q (e) in terms of a beta incomplete function with 
dependence on q, as 



FJe) 



1 ,B > ' 1 1 



2v / 2^F J^- 1+5oW 9-12' 



(A-16) 



The distribution of firing is approached as 



Q q (r)^NP r {r = -} 



NE E { 



N 



(A-17) 



Using that, 
can write 



iV 



exp(-JVrIog(r)-JV(l-r)log(l-r)) ^ ^ ^ q£ , ^ ^ 

y/2nNr(l-r) 



Q q (r) 



N 



(2vr) 2 r(l -r) 



F M 1 - F (e 

exp{N[r log(-^) + (1 - r) log ' 



e 



(A-18) 



1 — r 2 



But notice that, in the definition of Q q {r) the standard exponential is used 
since the correlation effects were previously included through the deformation 
parameter q within F q (e) (where we have used the ECLT [23]). The previous 
integral (Eq. IA-18j) is solved using the saddle point approximation [39fT2] , and 



then Eq. (129]) is obtained. The goodness of the fit was evaluated estimating 
the normalised mean squared error (NMSE), p — value < 0.05. It is then 
fitted the q parameter to get the best fitting function Q q (r) in Eq. (1251) for the 
experimental distribution. We used the matlab subroutine GFIT2 to compute 
goodness of fit, for regression model, given matrix/vector of target and output 
values. 



B. The Gaussian Distribution within the q-Information Geometry framework 
The q-Gaussian distribution is defined as: 



(x-n) 



2 



p(x, fi, a) = exp ? ( — (B-l) 
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which can be rewritten as: 



p(x,n,<r) = exp q (—x 



log q (V2n)). 



(B-2) 



2a 2 2a 2 



where can identify the correlation coordinates as 




(B-3) 



-1 



(B-4) 



2a 2 ' 



and the factor 



+ log q (V27T<j) 



(B-5) 



2a 2 



C. Simulation 

We applied our formalism to a network simulation model in which the num- 
ber of neurons is a parameter under control. In the following que consider a 
network of N = 200 neurons with the first Ne = 80 of excitatory RS type, and 
the remaining Ni=120 of inhibitory FS type [57]. Each excitatory neuron is 
connected to M=80 random neurons, and each inhibitory neuron is connected 
to M=80 excitatory neurons only. Synaptic connections among neurons have 
fixed conduction delays of 10 ms. Notice that weak higher-order interactions of 
almost all orders are required for realising a widespread activity distribution 
of a large population of neurons [12], this is in agreement with the hypothesis 
of Amari a [12] that the widespread probability distribution in the thermo- 
dynamic limit is not realised even when pairwise interactions, or third-order 
interactions, exist (see Fig. IC.ll ). 
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(a) (b) 




Fig. .1. Schematic representation of spike correlations, (a) Spikes being fired at a 
given time window of size At considering a population of N neurons. (b)E(ai), E(a,2) 
and E(as) are the family of distributions having the same correlation coordinates 
Oi, ci2, and 03, respectively. The family of all probability distributions belongs to 
the (/-exponential family of distributions for any q, and thus we can introduce the 
(/-geometrical structure to any arbitrary family of probability distributions. 
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Fig. .2. E(a\), E{a2) and E{a^) are the family of distributions having the same 
correlation coordinates a\, 02, and 03, respectively. The family of all probability 
distributions belongs to the (/-exponential family of distributions for any q, and thus 
we can introduce the ^-geometrical structure to any arbitrary family of probability 
distributions. 
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Fig. .3. Schematic representation of a neuronal pool with N neurons that receives 
si, S2, ■■■s m common inputs. 
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Fig. .4. Distribution of firing Q q {r) computed for q = 1 and q = 1.3 (semi-log in the 
X axis). 
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Fig. .5. Parameter q estimated through the fit of the firing distribution Q q (r) for 
the three different data sets, (a-c) AT=25 ms; (d-f) AT = 50 ms; (g-i) AT = 
100 ms (maximum observed q: black diamond in panel i, channel 15); and in (J-l) 
the time windows is AT =200ms (minimum observed q: black diamond in panel I, 
channel 12). The goodness of the fit was evaluated by estimating the normalised 
mean squared error (NMSE), p — value < 0.05, see Appendix. 
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Fig. .6. Fit of Q q {r) as a function of the normalised firing rate r (semi-log in the 
X axis), (a) AT=25 ms {N max = 130), g = 1.55 and Fisher Info, equal to 0.1026 
bits, b) AT=50 ms (N max = 160), q = 1.5242 and Fisher info, equal to 0.0489 bits, 
(c) AT=100 ms (Nmax = 200), q = 1.5483 and Fisher info, equal to 0.0151 bits. 
Id) AT=200 ms ( N max = 240), q = 1.3858 and Fisher info, equal to 0.0204 bits, 
which corresponds to the minimum observed q (Fig [31 panel I: channel 12, diamond 
symbol). The goodness of the fit was evaluated by estimating the normalised mean 
squared error (NMSE), p— value < 0.05. The theoretical approach considering q > 1 
(black dashed line) shows a remarkably good fit, in comparison to the case q = 1 
(grey dashed line), to the experimental curves (grey circles joined by dotted lines). 
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Fig. .7. Fit of Q q (r) as a function of the normalised firing rate r (semi-log in the 
X axis), (a) AT=25 ms (AW, = 130), q = 1.604 and Fisher info, equal to 0.0905 
bits, (b) AT=50 ms (N max = 140), q = 1.6 and Fisher info, equal to 0.0395 bits, 
(c) AT=100 ms (N max = 180), q = 1.6858 and Fisher info, equal to 0.0142 bits, 
which corresponds to the maximum observed q (Fig [3] panel i: channel 15, diamond 
symbol), (d) AT=200 ms {N max = 240 ), q = 1.4478 and Fisher info, equal to 
0.0150 bits. Same curve labels as in Fig [761 
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Fig. .8. Fisher information versus q (theoretical approach considering q > 1), taking 
four different time windows and considering all the 48 available channels. Black up- 
triangles, AT=25 ms; dark down-triangles, AT=50 ms; dark grey squares, AT=100 
ms; light grey circles, AT= 200 ms. 
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Fig. .9. (a) Fisher information averaged over the 48 different channels (theoretical 
approach considering q > 1). Black bar, AT=25 ms; dark grey bar, AT= 50 ms; 
grey bar, AT=100ms; light grey bar, AT=200ms. (b) Same as in B but considering 
the fit with q constrained to be equal to 1. 
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Fig. .10. Distribution of firing Q q (r) versus the normalised firing rate r (semi-log in 
the X axis), using 10 min of a simulated network that consists of N = 1000 neurons 
with the first Ne = 650 of excitatory RS type, and the remaining Ni = 350 of 
inhibitory FS (AT=25 ms). (a) Considering that each excitatory/inhibitory neuron 
is connected to M = 2 or 3 random neurons. We find no difference in the probability 
distribution of firing when M = 2 or M = 3 is considered, q = 1. (6) Considering 
that the number of randomly interconnected neurons is M = 80, q = 1.4252. The 
goodness of the fit was evaluated by estimating the normalised mean squared error 
(NMSE), p — value < 0.05. 
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Fig. .11. Parameter q estimated through the fit of the firing distribution Q q (r), ver- 
sus the number of interconnected neurons M. The goodness of the fit was evaluated 
estimating the normalised mean squared error (NMSE), p — value < 0.05. 



41 



Simulation 

■ q=1.4315 

■ q=l 




semi-log (r) 



Fig. C.l. Distribution of firing Q q (r) using 10 min of simulated network which 
consists of N = 200 neurons with the first Ne = 80 of excitatory RS type, and the 
remaining Ni=120 of inhibitory FS (AT=25 ms). Considering that the number of 
randomly interconnected neurons is M=80, q = 1.4315. The goodness of the fit was 
evaluated estimating the normalised mean squared error (NMSE), p — value < 0.05. 
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